Tetris: Experiments with the LP Approach to Approximate DP
Authors
Abstract
We study the linear programming (LP) approach to approximate dynamic programming (DP) through experiments with the game of Tetris. Our empirical results suggest that the performance of policies generated by the approach is highly sensitive to the problem formulation and to the discount factor. Furthermore, we find that, using a state-sampling scheme of the kind proposed in [7], the simulation time required to generate an adequate number of constraints far exceeds the time taken to solve the resulting LP. As an extension of the standard approximate LP approach, we examine a bootstrapped version in which a sequence of LPs is solved, with the policy generated by each solution used to sample constraints for the next LP. Our empirical results demonstrate that this bootstrapped approach can substantially improve performance.
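To make the approach concrete, the following is a minimal sketch (not the paper's exact formulation) of the approximate LP for a tiny cost-minimizing MDP, solved with SciPy. All names here (`n_states`, `phi`, `g`, and the random problem data) are illustrative assumptions; the weight vector r is chosen to maximize c'Φr subject to the Bellman-inequality constraints Φr ≤ TΦr, which keeps Φr a pointwise lower bound on the optimal cost-to-go.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative toy MDP: random transition kernel P[a, s, s'] and stage costs g[s, a].
rng = np.random.default_rng(0)
n_states, n_actions, alpha = 10, 3, 0.9
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)          # normalize rows to probabilities
g = rng.random((n_states, n_actions))

# Feature matrix Phi: a constant feature (guarantees feasibility) plus random features.
phi = np.hstack([np.ones((n_states, 1)), rng.random((n_states, 3))])
c = np.ones(n_states) / n_states           # state-relevance weights

# Approximate LP: maximize c' Phi r
#   s.t. (Phi r)(s) <= g(s, a) + alpha * sum_s' P(s'|s, a) (Phi r)(s')  for all (s, a).
# Rearranged into linprog's "A_ub @ r <= b_ub" form, one row per (s, a) pair:
rows, rhs = [], []
for a in range(n_actions):
    rows.append(phi - alpha * P[a] @ phi)
    rhs.append(g[:, a])
A_ub, b_ub = np.vstack(rows), np.concatenate(rhs)

res = linprog(-(c @ phi), A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * phi.shape[1])  # r is unrestricted in sign
J_tilde = phi @ res.x  # approximate cost-to-go, a pointwise lower bound on J*
```

In a problem the size of Tetris one cannot enumerate all (s, a) constraint rows; the sampling scheme of [7] instead builds `A_ub` from states visited under a simulated policy, and the bootstrapped variant studied here re-samples those rows under each successive policy.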
Similar Papers
The Smoothed Approximate Linear Program
We present a novel linear program for the approximation of the dynamic programming costto-go function in high-dimensional stochastic control problems. LP approaches to approximate DP have typically relied on a natural ‘projection’ of a well studied linear program for exact dynamic programming. Such programs restrict attention to approximations that are lower bounds to the optimal cost-to-go fun...
Approximate Dynamic Programming via a Smoothed Linear Program
We present a novel linear program for the approximation of the dynamic programming costto-go function in high-dimensional stochastic control problems. LP approaches to approximate DP have typically relied on a natural ‘projection’ of a well studied linear program for exact dynamic programming. Such programs restrict attention to approximations that are lower bounds to the optimal cost-to-go fun...
A Smoothed Approximate Linear Program
We present a novel linear program for the approximation of the dynamic programming cost-to-go function in high-dimensional stochastic control problems. LP approaches to approximate DP naturally restrict attention to approximations that are lower bounds to the optimal cost-to-go function. Our program – the ‘smoothed approximate linear program’ – relaxes this restriction in an appropriate fashion...
Approximate modified policy iteration and its application to the game of Tetris
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximation form which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are exten...
A learning algorithm based on $λ$-policy iteration and its application to the video game "Tetris Attack"
We present an application of λ-policy iteration, an algorithm based on neuro-dynamic programming (described by Bertsekas and Tsitsiklis [BT96]), to the video game Tetris Attack in the form of an automated player. To this end, we first introduce the theoretical foundations underlying the method and model the game as a dynamic programming problem. Afterwards, we perform multiple experiments us...